---
title: "Demystifying the Carnegie Classifications"
author: "Paul Harmon"
date: "April 17, 2017"
output:
  beamer_presentation:
    toc: true
    theme: "AnnArbor"
    colortheme: "crane"
    fonttheme: "structurebold"
---
Montana State (R-2), Stanford (R-1), Boise State (R-3):
2015 Carnegie Classifications
Nearest Institutions to Montana State
Goal: To reduce \(p\) predictors to \(k < p\) components via an eigenvalue decomposition. PCA can be run on the covariance matrix of the raw (unscaled) data or on the correlation matrix, which is the covariance matrix of the standardized data.
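The choice between the two matrices matters when predictors are on different scales. Below is a minimal NumPy sketch (the helper `pca_eig` and the synthetic data are hypothetical, made up for illustration). With two correlated variables whose scales differ by a factor of about 1000, covariance-based PCA puts almost all of the first loading on the large-scale variable, while correlation-based PCA weights the two variables equally.

```python
import numpy as np

def pca_eig(X, scale=False):
    """PCA by eigendecomposition (hypothetical helper).

    scale=False: eigendecompose the covariance matrix of the raw data.
    scale=True:  eigendecompose the correlation matrix (covariance of standardized data).
    Returns eigenvalues in descending order and loadings (one column per component).
    """
    X = np.asarray(X, dtype=float)
    M = np.corrcoef(X, rowvar=False) if scale else np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(M)   # M is symmetric; eigh returns ascending order
    order = np.argsort(eigvals)[::-1]      # put the component explaining the most first
    return eigvals[order], eigvecs[:, order]

# Synthetic data: two correlated variables on very different scales
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.column_stack([z + 0.5 * rng.normal(size=500),            # scale ~ 1
                     1000 * (z + 0.5 * rng.normal(size=500))])  # scale ~ 1000

vals_cov, load_cov = pca_eig(X, scale=False)
vals_cor, load_cor = pca_eig(X, scale=True)
print(np.round(np.abs(load_cov[:, 0]), 3))  # dominated by the large-scale variable
print(np.round(np.abs(load_cor[:, 0]), 3))  # equal weights once variables are standardized
```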
Given \(p\) predictors \(x_1, x_2, \ldots, x_p\), an eigenvalue decomposition of the covariance (or correlation) matrix of \(X\) generates a set of \(p\) new variables \(y_1, y_2, \ldots, y_p\). The \(y\)’s are ordered so that \(y_1\) explains the most variation in the underlying \(x\)’s and \(y_p\) the least.
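A minimal NumPy sketch of this decomposition on synthetic data (the data are hypothetical): center \(X\), eigendecompose its sample covariance matrix, and project onto the eigenvectors. The sample variance of each new variable \(y_k\) equals the \(k\)-th largest eigenvalue, and the \(y\)’s are mutually uncorrelated, which is what lets them be ordered from most to least variation explained.

```python
import numpy as np

# Synthetic data matrix X: n = 200 observations of p = 4 predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # mix columns to induce correlation

Xc = X - X.mean(axis=0)                  # center each predictor
S = np.cov(Xc, rowvar=False)             # p x p sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)     # symmetric eigendecomposition (ascending order)
order = np.argsort(eigvals)[::-1]        # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Y = Xc @ eigvecs                         # columns are y_1, ..., y_p

# Var(y_k) matches the k-th eigenvalue, so the y's are ordered by variance explained
print(np.round(Y.var(axis=0, ddof=1), 3))
print(np.round(eigvals, 3))
```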
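Scores: The new set of covariates. Each score is a linear combination (a weighted sum, not an average: the weights can be negative and need not sum to 1) of the centered original covariates. Loadings: The weights in those linear combinations, i.e., the eigenvectors; they give the formula used to calculate the scores from the original covariates.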
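To make scores and loadings concrete, the sketch below uses toy numbers (made up for illustration). It takes the loadings to be the eigenvectors of the covariance matrix, computes all scores at once as a matrix product, and then reproduces one score by hand as a weighted sum of centered covariates.

```python
import numpy as np

# Toy data (made up): 5 observations of 3 covariates
X = np.array([[2.0, 4.0, 1.0],
              [3.0, 5.0, 0.0],
              [5.0, 9.0, 2.0],
              [6.0, 11.0, 2.0],
              [9.0, 16.0, 5.0]])

Xc = X - X.mean(axis=0)
eigvals, loadings = np.linalg.eigh(np.cov(Xc, rowvar=False))
loadings = loadings[:, np.argsort(eigvals)[::-1]]  # columns = loadings for PC1, PC2, PC3

scores = Xc @ loadings  # every score is a linear combination of the centered covariates

# The first observation's PC1 score, written out term by term from the loadings
w = loadings[:, 0]
by_hand = w[0] * Xc[0, 0] + w[1] * Xc[0, 1] + w[2] * Xc[0, 2]
print(np.isclose(by_hand, scores[0, 0]))  # True
print(np.round(loadings, 3))              # each column has unit length, not unit sum
```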
But how do we do dimension reduction? Since we know how much variation in \(x\) is explained by each \(y\)-score, we can keep only the first \(k\) scores, which explain the most variation, and use them as a new, smaller set of covariates. The Carnegie Classifications use only the first score from each PCA run.
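A sketch of the reduction step with \(k = 1\), under stated assumptions: the hypothetical helper `first_pc_score` standardizes the columns (correlation-matrix PCA), keeps only the first score, and reports the proportion of total variance that score explains, \(\lambda_1 / \sum_j \lambda_j\). The data are synthetic: four noisy copies of one latent variable.

```python
import numpy as np

def first_pc_score(X):
    """Reduce the p columns of X to one column: the first principal-component score.

    Also returns the proportion of total variance that score explains.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize -> correlation-matrix PCA
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    k = np.argmax(eigvals)                            # index of the largest eigenvalue
    score = Z @ eigvecs[:, k]                         # sign is arbitrary (eigenvectors are defined up to sign)
    return score, eigvals[k] / eigvals.sum()

rng = np.random.default_rng(2)
latent = rng.normal(size=300)
X = np.column_stack([latent + 0.3 * rng.normal(size=300) for _ in range(4)])

score, prop = first_pc_score(X)
print(score.shape)     # (300,)
print(round(prop, 2))  # share of total variance kept by the first score
```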
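The classifications are calculated from two indices of research activity. Each index is the first PC score from a separate PCA, so every input enters with its own loading rather than a weight of 1. The aggregate index combines doctorates awarded in four field groups with research expenditures and research staff; the per-capita index measures research expenditures and research staff per faculty member.
Aggregate Index: \[Ag.Index_{i} = w_1 HumanitiesPhD_{i} + w_2 StemPhD_{i} + w_3 SocialSciencePhD_{i} + w_4 OtherPhD_{i} + w_5 StemExpenditures_{i} + w_6 NonStemExpenditures_{i} + w_7 ResearchStaff_{i}\]
Per Capita Index: \[PC.Index_{i} = \frac{v_1 ResearchStaff_{i} + v_2 StemExpenditures_{i} + v_3 NonStemExpenditures_{i}}{FacultySize_{i}}\]
where \(w_1, \ldots, w_7\) and \(v_1, v_2, v_3\) are the first-component loadings from the two PCA runs (applied, in practice, to centered and scaled versions of these inputs).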
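A sketch of how the two indices could be computed as first PC scores, run on synthetic institutions. Every value and helper name here (`first_pc`, `size_driven`, the variable names) is hypothetical. The inputs are simply standardized, and any ranking or other preprocessing used in the published methodology is not reproduced. The aggregate index uses the seven raw measures, and the per-capita index divides staff and the two expenditure measures by faculty size before its PCA.

```python
import numpy as np

def first_pc(M):
    """First principal-component score of the standardized columns of M, plus its loadings."""
    Z = (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    w = eigvecs[:, np.argmax(eigvals)]
    if w.sum() < 0:  # eigenvector sign is arbitrary; orient so larger inputs give a larger index
        w = -w
    return Z @ w, w

def size_driven(scale, inst_size, rng):
    """Hypothetical measure: proportional to institution size, with multiplicative noise."""
    return scale * inst_size * rng.lognormal(mean=0.0, sigma=0.3, size=inst_size.shape)

# Synthetic institutions (made-up values, not Carnegie data)
rng = np.random.default_rng(3)
inst_size = rng.lognormal(mean=0.0, sigma=1.0, size=50)

hum_phd, stem_phd, soc_phd, other_phd = (size_driven(s, inst_size, rng) for s in (10, 40, 15, 20))
stem_exp = size_driven(1e8, inst_size, rng)
nonstem_exp = size_driven(1e7, inst_size, rng)
staff = size_driven(100, inst_size, rng)
faculty = size_driven(300, inst_size, rng)

# Aggregate index: first PC of the seven measures
ag_inputs = np.column_stack([hum_phd, stem_phd, soc_phd, other_phd,
                             stem_exp, nonstem_exp, staff])
ag_index, w = first_pc(ag_inputs)

# Per-capita index: first PC of staff and expenditures divided by faculty size
pc_inputs = np.column_stack([staff, stem_exp, nonstem_exp]) / faculty[:, None]
pc_index, v = first_pc(pc_inputs)

print(np.round(w, 2))  # loadings for the seven aggregate inputs
print(np.round(v, 2))  # loadings for the three per-capita inputs
```

Orienting each loading vector so its entries sum to a nonnegative value is one simple convention for making "higher index" mean "more research activity"; other conventions would work as well.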
The Carnegie Classifications rest on several seemingly arbitrary choices. Why assign tied values the minimum rank instead of the average rank? Why un-scale the index scores before drawing the boundaries between groups? Why place the boundaries where they are? And rather than clustering on only the first PC score from each of the two indices, why not include the first two PC scores?
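The minimum-versus-average question presumably concerns how tied values are ranked. Here is a pure-Python sketch of the two tie rules. The `rank` helper and the counts are hypothetical, ranks are assumed to be ascending, and the option names mirror R's `rank(ties.method = ...)`. Under `"min"` every tied value gets the lowest rank in its group; under `"average"` it gets the mean of the ranks the group spans.

```python
def rank(values, ties="min"):
    """1-based ascending ranks of `values` (smallest value gets rank 1).

    ties="min":     tied values all receive the lowest rank in their group
    ties="average": tied values receive the mean of the ranks they span
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # sorted positions i..j (0-based) correspond to ranks i+1..j+1
        r = i + 1 if ties == "min" else (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = float(r)
        i = j + 1
    return ranks

# Hypothetical counts with a three-way tie (e.g., three institutions reporting 0)
counts = [0, 0, 0, 4, 7]
print(rank(counts, "min"))      # [1.0, 1.0, 1.0, 4.0, 5.0]
print(rank(counts, "average"))  # [2.0, 2.0, 2.0, 4.0, 5.0]
```

When many institutions share a value such as zero doctorates in a field, the two rules assign them different ranks, and any PCA run on those ranks sees different inputs.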